Machine learning enabled query re-optimization algorithms for cloud database systems
In cloud database systems, hardware configurations, data usage, and workload allocations change continuously. These changes make it difficult for the query optimizer to obtain an optimal query execution plan (QEP) based on data statistics collected before query execution. To optimize a query with a more accurate cost estimation and thereby achieve such a QEP, performing query re-optimizations during query execution has been proposed in the literature. However, some re-optimizations may yield no gain in query response time or monetary cost and, due to their overheads, may even degrade query performance. This raises the question of how to determine when a re-optimization is beneficial. In addition, a Service Level Agreement (SLA) is signed between users and the cloud provider. Query re-optimization is therefore a multi-objective optimization problem that minimizes not only query execution time and monetary cost but also SLA violations. However, none of the existing query re-optimization algorithms considers all three objectives together, and none can predict when a re-optimization is beneficial.
To fill this gap, this dissertation proposes four novel query re-optimization algorithms: ReOpt, ReOptML, ReOptRL, and SLAReOptRL. Extensive theoretical and experimental evaluations show that each of them outperforms state-of-the-art techniques in execution time, monetary cost, and SLA violation rate on the TPC-H database benchmark.
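The core question above, when does a re-optimization pay off, can be framed as a cost-benefit test over runtime statistics observed mid-execution. A minimal sketch of that idea (the feature names and the benefit rule are hypothetical illustrations, not the dissertation's actual models):

```python
from dataclasses import dataclass

@dataclass
class QueryState:
    # Hypothetical runtime statistics observed during query execution.
    est_remaining_time: float   # optimizer's current remaining-time estimate (s)
    observed_rate_ratio: float  # actual vs. estimated tuple-processing rate
    reopt_overhead: float       # cost of pausing execution and re-planning (s)

def reoptimization_beneficial(state: QueryState) -> bool:
    """Re-optimize only if the projected time saved by correcting a
    mis-estimated plan exceeds the re-optimization overhead."""
    # A large deviation between observed and estimated progress means
    # the current plan's cost model is stale.
    deviation = abs(1.0 - state.observed_rate_ratio)
    projected_saving = deviation * state.est_remaining_time
    return projected_saving > state.reopt_overhead
```

In the learned variants (e.g. ReOptML), such a hand-written rule would be replaced by a classifier trained on execution traces; the decision interface stays the same.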
Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs
Graph neural networks (GNNs), as the de-facto model class for representation learning on graphs, are built upon the multi-layer perceptron (MLP) architecture with additional message-passing layers that allow features to flow across nodes. While conventional wisdom commonly attributes the success of GNNs to their advanced expressivity, we conjecture that this is not the main cause of GNNs' superiority in node-level prediction tasks. This paper pinpoints the major source of GNNs' performance gain as their intrinsic generalization capability, by introducing an intermediate model class dubbed P(ropagational)MLP, which is identical to a standard MLP in training but adopts the GNN architecture in testing. Intriguingly, we observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient to train. This finding sheds new light on the learning behavior of GNNs and can be used as an analytic tool for dissecting various GNN-related research problems. As an initial step toward analyzing the inherent generalizability of GNNs, we show that the essential difference between MLP and PMLP in the infinite-width limit lies in the NTK feature map in the post-training stage. Moreover, by examining their extrapolation behavior, we find that although many GNNs and their PMLP counterparts cannot extrapolate non-linear functions for extremely out-of-distribution samples, they have greater potential to generalize to testing samples near the training data range, a natural advantage of GNN architectures.
Comment: Accepted to ICLR 2023. Code: https://github.com/chr26195/PML
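The PMLP idea, train as a plain MLP, then insert message passing only at inference, can be sketched in a few lines. The mean-aggregation propagation below is one illustrative choice of message-passing operator, not necessarily the paper's exact formulation:

```python
import numpy as np

def mlp_forward(X, W1, W2):
    # Standard two-layer MLP: per-node transformation, no graph structure used.
    return np.maximum(X @ W1, 0) @ W2

def pmlp_forward(X, A, W1, W2):
    """PMLP-style inference: reuse the MLP-trained weights W1, W2, but
    insert a message-passing (mean-aggregation) step at test time."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
    P = D_inv * A_hat                            # row-normalized propagation matrix
    H = np.maximum(P @ X @ W1, 0)                # propagate, then transform
    return P @ H @ W2                            # propagate again before output layer
```

With an empty adjacency matrix (self-loops only), `pmlp_forward` reduces exactly to `mlp_forward`, which is why the two share trained weights.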
A Vision of a Decisional Model for Re-optimizing Query Execution Plans Based on Machine Learning Techniques
Many existing cloud database query optimization algorithms target reducing the monetary cost paid to cloud service providers in addition to query response time. These algorithms rely on accurate cost estimation so that the optimal query execution plan (QEP) is selected. The cloud environment is dynamic: hardware configurations, data usage, and workload allocations are continuously changing. These dynamic changes make an accurate query cost estimation difficult to obtain, and the query execution plan must be adjusted automatically to address them. To optimize the QEP with a more accurate cost estimation, the query needs to be optimized multiple times during execution, each time using the most up-to-date estimates. However, pausing execution for re-optimization introduces overhead, so the pause points must be chosen carefully. In this paper, we present our vision of a method that uses machine learning techniques to predict the best timings for optimization during execution.
Ion Exchange Membranes for Electrodialysis: A Comprehensive Review of Recent Advances
Electrodialysis-related processes are effectively applied in the desalination of sea and brackish water, wastewater treatment, the chemical process industry, and the food and pharmaceutical industries. The fundamental component in this process is the ion exchange membrane (IEM), which allows the selective transport of ions. Advances in IEMs not only make the process cleaner and more energy-efficient but also recover useful effluents that currently go to waste. Ion-exchange membranes with better selectivity, lower electrical resistance, and good chemical, mechanical, and thermal stability are required for these processes. Many strategies have been applied over the last two decades to develop new IEMs. This paper briefly reviews synthetic aspects in the development of new ion-exchange membranes and their applications in electrodialysis-related processes.
SLA-Aware Cloud Query Processing with Reinforcement Learning-based Multi-Objective Re-Optimization
Query processing on cloud database systems is a challenging problem due to the dynamic cloud environment. In cloud database systems, besides query execution time, users also consider the monetary cost paid to the cloud provider for executing queries. Moreover, a Service Level Agreement (SLA) is signed between users and cloud providers before any service is provided. Thus, from the profit-oriented perspective of cloud providers, query re-optimization is a multi-objective optimization problem that minimizes not only query execution time and monetary cost but also SLA violations. In this paper, we introduce ReOptRL and SLAReOptRL, two novel query re-optimization algorithms based on deep reinforcement learning. Experiments show that both algorithms improve query execution time and monetary cost by 50% over existing algorithms, and that SLAReOptRL has the lowest SLA violation rate of all the algorithms.
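In a reinforcement learning formulation such as the one described above, the three objectives are typically folded into a scalar reward that the agent maximizes. A minimal sketch of such a reward shape (the weights, scales, and penalty are invented for illustration, not the papers' actual formulation):

```python
def reward(exec_time, monetary_cost, sla_violated,
           w_time=0.4, w_cost=0.4, w_sla=0.2,
           time_scale=60.0, cost_scale=1.0):
    """Hypothetical multi-objective reward: negative weighted sum of
    normalized execution time and monetary cost, with an extra penalty
    when the SLA is violated. Higher (less negative) is better."""
    r = -(w_time * exec_time / time_scale + w_cost * monetary_cost / cost_scale)
    if sla_violated:
        r -= w_sla * 10.0   # heavy penalty for an SLA breach
    return r
```

The relative weights encode the provider's trade-off between speed, cost, and SLA compliance; tuning them changes which plans the agent prefers.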
Diagnosis and Prognosis Using Machine Learning Trained on Brain Morphometry and White Matter Connectomes
Accurate, reliable prediction of risk for Alzheimer's disease (AD) is essential for early, disease-modifying therapeutics. Multimodal MRI, such as structural and diffusion MRI, is likely to contain complementary information about neurodegenerative processes in AD. Here we tested the utility of commonly available multimodal MRI (T1-weighted structural and diffusion MRI), combined with high-throughput brain phenotyping (morphometry and connectomics) and machine learning, as a diagnostic tool for AD. We used, firstly, a clinical cohort at a dementia clinic (study 1: Ilsan Dementia Cohort; N=211; 110 AD, 64 mild cognitive impairment [MCI], and 37 subjective memory complaints [SMC]) to test and validate the diagnostic models; and, secondly, the Alzheimer's Disease Neuroimaging Initiative (ADNI)-2 (study 2) to test the generalizability of the approach and the prognostic models with longitudinal follow-up data. Our machine learning models trained on the morphometric and connectome estimates (number of features = 34,646) showed optimal classification accuracy (AD/SMC: 97%, MCI/SMC: 83%, AD/MCI: 97%) with iterative nested cross-validation in a single-site study, outperforming the benchmark model (FLAIR-based white matter hyperintensity volumes). In a generalizability study using ADNI-2, the combined connectome and morphometry model showed similar or superior accuracies (AD/HC: 96%; MCI/HC: 70%; AD/MCI: 75%) to the CSF biomarker model (t-tau, p-tau, Amyloid β, and their ratios). We also predicted MCI-to-AD progression with 69% accuracy, compared with 70% for the CSF biomarker model. The optimal classification accuracy in a single-site dataset and the reproduced results in a multi-site dataset show the feasibility of high-throughput imaging analysis of multimodal MRI and data-driven machine learning for predictive modeling in AD.
A Scored Semantic Cache Replacement Strategy for Mobile Cloud Database Systems
Current mobile cloud database systems are widespread and require special considerations for mobile devices. Although many systems rely on numerous metrics for use and optimization, few leverage metrics for decisional cache replacement on the mobile device. In this paper we introduce the Lowest Scored Replacement (LSR) policy, a novel cache replacement policy based on a predefined score that leverages contextual mobile data and user preferences for replacement decisions. We show an implementation of the policy using our previously proposed MOCCAD-Cache as the decisional semantic cache and our Normalized Weighted Sum Algorithm (NWSA) as the score basis. The score normalization is based on query response time, energy spent on the mobile device, and the monetary cost paid to a cloud provider. We then demonstrate a relevant scenario in which LSR excels compared to the Least Recently Used (LRU) and Least Frequently Used (LFU) cache replacement policies.
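The scored-replacement idea can be sketched as follows: each cached entry carries per-metric benefit estimates, each metric is min-max normalized across the cache, combined by a weighted sum, and the lowest-scoring entry is evicted. The weights, metric triple, and normalization direction below are illustrative choices, not the paper's exact NWSA definition:

```python
def lsr_evict(cache, weights=(0.4, 0.3, 0.3)):
    """Evict the entry with the Lowest Score. Each cached entry maps to a
    (time_saved, energy_saved, money_saved) triple; its score is the
    weighted sum of the min-max-normalized metrics (higher score = more
    worth keeping). Returns the evicted key."""
    metrics = list(cache.values())
    lo = [min(m[i] for m in metrics) for i in range(3)]
    hi = [max(m[i] for m in metrics) for i in range(3)]

    def score(m):
        # Min-max normalize each metric across the cache, then combine.
        return sum(w * ((v - l) / (h - l) if h > l else 1.0)
                   for w, v, l, h in zip(weights, m, lo, hi))

    victim = min(cache, key=lambda k: score(cache[k]))
    del cache[victim]
    return victim
```

Unlike LRU or LFU, which rank entries on a single recency or frequency signal, this scheme lets the device trade off response time, energy, and monetary cost in one scalar, which is the scenario where the paper reports LSR excels.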